9 research outputs found

    Characterizing model uncertainty in ensemble learning

    Depth Functions for Partial Orders with a Descriptive Analysis of Machine Learning Algorithms

    We propose a framework for descriptively analyzing sets of partial orders based on the concept of depth functions. Despite intensive studies of depth functions in linear and metric spaces, there is very little discussion of depth functions for non-standard data types such as partial orders. We introduce an adaptation of the well-known simplicial depth to the set of all partial orders, the union-free generic (ufg) depth. Moreover, we utilize our ufg depth for a comparison of machine learning algorithms based on multidimensional performance measures. Concretely, we analyze the distribution of different classifier performances over a sample of standard benchmark data sets. Our results promisingly demonstrate that our approach differs substantially from existing benchmarking approaches and, therefore, adds a new perspective to the vivid debate on the comparison of classifiers. (Comment: Accepted to ISIPTA 2023; forthcoming in Proceedings of Machine Learning Research.)
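    As a rough illustration of the kind of object the ufg depth operates on, the sketch below (not the authors' implementation; classifier names and performance numbers are made up) builds the dominance partial order of classifiers from multidimensional performance measures on a single benchmark data set.

import numpy as np

def dominance_partial_order(perf):
    """Return pairs (a, b) meaning 'a performs at least as well as b' on every measure.

    perf maps classifier names to vectors of performance measures, all oriented
    so that larger is better. Incomparable pairs are simply absent, which is
    what makes the resulting relation a partial order rather than a ranking.
    """
    relation = set()
    for a, pa in perf.items():
        for b, pb in perf.items():
            if np.all(pa >= pb):
                relation.add((a, b))
    return relation

# Hypothetical measures per classifier: accuracy, AUC, negated runtime (larger is better).
perf = {
    "rf":  np.array([0.91, 0.95, -12.0]),
    "svm": np.array([0.89, 0.94, -30.0]),
    "knn": np.array([0.85, 0.90, -5.0]),
}
print(dominance_partial_order(perf))  # rf dominates svm; rf and knn are incomparable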

    Not All Data Are Created Equal: Lessons From Sampling Theory For Adaptive Machine Learning

    In survey methodology, inverse probability weighted (Horvitz-Thompson) estimation has become an indispensable part of statistical inference. This is triggered by the need to deal with complex samples, that is, non-identically distributed data. The general idea is that weighting observations inversely to their probability of being included in the sample produces unbiased estimators with reduced variance. In this work, we argue that complex samples are subtly ubiquitous in two promising subfields of data science: Self-Training in Semi-Supervised Learning (SSL) and Bayesian Optimization (BO). Both methods rely on refitting learners to artificially enhanced training data. These enhancements are based on pre-defined criteria for selecting data points, rendering some data more likely to be added than others. We experimentally analyze the distance from the so-produced complex samples to i.i.d. samples using the Kullback-Leibler divergence and the maximum mean discrepancy. What is more, we propose to handle such samples by inverse probability weighting. This requires estimation of inclusion probabilities. Unlike for some observational survey data, however, this is not a major issue, since we have ample explicit information on the inclusion mechanism. After all, we generate the data ourselves by means of the selection criteria. To make things more tangible, consider the case of BO first. It optimizes an unknown function by iteratively approximating it through a surrogate model, whose mean and standard error estimates are scalarized into a selection criterion. The arguments optimizing this criterion are evaluated and added to the training data. We propose to weight them by means of the surrogate model's standard errors at the time of selection. For the case of deploying random forests as surrogate models, we refit them by weighted drawing in the bootstrap sampling step. Refitting may be done iteratively, aiming to speed up the optimization, or after convergence, aiming to provide practitioners with an interpretable (global) surrogate model. Similarly, self-training in SSL selects instances from a set of unlabeled data, predicts their labels, and adds the pseudo-labeled data to the training data. Instances are selected according to a confidence measure, e.g., the predictive variance. Regions in the feature space where the model is very confident are thus over-represented in the selected sample. We again explicitly exploit the selection criteria to define weights, which we use for resampling-based refitting of the model. Somewhat counter-intuitively, the more confident the model is in the self-assigned labels, the lower their weights should be to counteract the selection bias. Preliminary results suggest this can increase generalization performance.
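    A minimal sketch of the self-training part of this idea follows (my own simplification, with illustrative names and thresholds, not the paper's exact setup; it also uses sample weights at refit time rather than the weighted bootstrap drawing described for the BO case): pseudo-labeled points are selected by a confidence measure but refitted with weights that are lower the more confident the model is.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def self_train_ipw(X_lab, y_lab, X_unlab, n_rounds=3, top_frac=0.2):
    X, y, w = X_lab.copy(), y_lab.copy(), np.ones(len(y_lab))
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y, sample_weight=w)
    for _ in range(n_rounds):
        if len(X_unlab) == 0:
            break
        # Confidence proxy: spread of the per-tree predictions (smaller = more confident).
        per_tree = np.stack([tree.predict(X_unlab) for tree in model.estimators_])
        std = per_tree.std(axis=0)
        k = max(1, int(top_frac * len(X_unlab)))
        picked = np.argsort(std)[:k]
        # The most confident points are the ones selected, but they receive the
        # *lowest* weights, counteracting the selection bias described above.
        new_w = std[picked] / (std[picked].max() + 1e-12)
        X = np.vstack([X, X_unlab[picked]])
        y = np.concatenate([y, model.predict(X_unlab[picked])])  # pseudo-labels
        w = np.concatenate([w, new_w])
        X_unlab = np.delete(X_unlab, picked, axis=0)
        model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y, sample_weight=w)
    return model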

    Evaluating machine learning models in non-standard settings: An overview and new findings

    Estimating the generalization error (GE) of machine learning models is fundamental, with resampling methods being the most common approach. However, in non-standard settings, particularly those where observations are not independently and identically distributed, resampling using simple random data divisions may lead to biased GE estimates. This paper strives to present well-grounded guidelines for GE estimation in various such non-standard settings: clustered data, spatial data, unequal sampling probabilities, concept drift, and hierarchically structured outcomes. Our overview combines well-established methodologies with other existing methods that, to our knowledge, have not been frequently considered in these particular settings. A unifying principle among these techniques is that the test data used in each iteration of the resampling procedure should reflect the new observations to which the model will be applied, while the training data should be representative of the entire data set used to obtain the final model. Beyond providing an overview, we address literature gaps by conducting simulation studies. These studies assess the necessity of using GE-estimation methods tailored to the respective setting. Our findings corroborate the concern that standard resampling methods often yield biased GE estimates in non-standard settings, underscoring the importance of tailored GE estimation.
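    As one concrete illustration of this principle (clustered data, with made-up data and names; not taken from the paper), grouped resampling keeps whole clusters out of the training folds so that test folds mimic application to new clusters:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
groups = np.repeat(np.arange(20), 10)          # 20 clusters of 10 observations each
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)

clf = RandomForestClassifier(random_state=0)
# Grouped CV: each test fold contains only clusters never seen during training,
# mimicking application to new clusters; plain random KFold would mix clusters
# across folds and typically yield optimistically biased GE estimates.
scores = cross_val_score(clf, X, y, cv=GroupKFold(n_splits=5), groups=groups)
print(scores.mean())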

    Horseshoe RuleFit: Learning Rule Ensembles via Bayesian Regularization

    This work proposes Hs-RuleFit, a learning method for regression and classification that combines rule ensemble learning based on the RuleFit algorithm with Bayesian regularization through the horseshoe prior. To this end, theoretical properties and potential problems of this combination are studied. A second step is the implementation, which utilizes recent sampling schemes to make Hs-RuleFit computationally feasible. Additionally, changes to the RuleFit algorithm are proposed, such as decision rule post-processing and the use of decision rules generated via random forests. Hs-RuleFit addresses the problem of finding highly accurate yet interpretable models. The method is shown to be capable of finding compact sets of informative decision rules that give good insight into the data. Through the careful choice of prior distributions, the horseshoe prior is shown to be superior to the Lasso in this context. In an empirical evaluation on 16 real data sets, Hs-RuleFit shows excellent performance in regression and outperforms the popular methods Random Forest, BART and RuleFit in terms of prediction error. The interpretability is demonstrated on selected data sets. This makes Hs-RuleFit a good choice for science domains in which interpretability is desired. Problems are found in classification regarding the usage of the horseshoe prior and rule ensemble learning in general. A simulation study is performed to isolate the problems, and potential solutions are discussed. Arguments are presented that the horseshoe prior could be a good choice in other machine learning areas, such as artificial neural networks and support vector machines.
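    The sketch below illustrates only the rule-generation step in a simplified form (leaf-based rules from a small random forest; these are my own choices, not necessarily those made in this work). Hs-RuleFit would then fit a sparse linear model with a horseshoe prior on the resulting binary rule features, a step omitted here.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def forest_to_rule_features(forest, X):
    """Encode each leaf of each tree as a binary 'rule fired' feature."""
    cols = []
    for tree in forest.estimators_:
        leaf_ids = tree.apply(X)                      # leaf index per observation
        for leaf in np.unique(leaf_ids):
            cols.append((leaf_ids == leaf).astype(float))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X[:, 0] - 2 * (X[:, 1] > 0) + rng.normal(scale=0.3, size=300)

forest = RandomForestRegressor(n_estimators=10, max_depth=3, random_state=0).fit(X, y)
R = forest_to_rule_features(forest, X)   # rule design matrix for the subsequent Bayesian fit
print(R.shape)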

    Cultivated Random Forests: Robust Decision Tree Learning through Tree Structured Ensembles

    We propose a robust decision tree induction method that mitigates the problems of instability and poor generalization on unseen data. In the spirit of model imprecision and robust statistics, we generalize decision trees by replacing internal nodes with two types of ensemble modules that pool a set of decisions into a soft decision: (1) option modules consisting of all reasonable variable choices at each step of the induction process, and (2) robust split modules including all elements of a neighbourhood of an optimal split point as reasonable alternative split points. We call the resulting set of trees a cultivated random forest, as it corresponds to an ensemble of trees centered around a single tree structure, alleviating the loss of interpretability of traditional ensemble methods. The explicit modelling of non-probabilistic uncertainty about the tree structure also provides an estimate of the reliability of predictions, making it possible to abstain from predictions when the uncertainty is too high. On a variety of benchmark data sets, we show that our method is often competitive with random forests while being structurally much simpler and easier to interpret.
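    A minimal sketch of the robust-split idea (my own simplification, not the paper's algorithm): instead of committing to a single optimal split point, pool the branching decisions over a neighbourhood of alternative split points into a soft decision.

import numpy as np

def soft_split(x, best_threshold, neighbourhood):
    """Return the fraction of candidate split points that send x to the right child.

    `neighbourhood` holds reasonable alternative split points around `best_threshold`;
    values strictly between 0 and 1 express uncertainty about the branching decision.
    """
    thresholds = np.append(neighbourhood, best_threshold)
    return float(np.mean(x > thresholds))

# A point close to the optimal threshold receives an ambiguous soft decision,
# which downstream can be used to abstain when the uncertainty is too high.
print(soft_split(5.1, best_threshold=5.0, neighbourhood=np.array([4.8, 4.9, 5.2, 5.3])))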

    Discriminative Power Lasso -- Incorporating Discriminative Power of Genes into Regularization-Based Variable Selection

    In precision medicine, it is known that specific genes are decisive for the development of different cell types. In drug development it is therefore of high relevance to identify biomarkers that allow distinguishing cell subtypes connected to a disease. The main goal is to find a sparse set of genes that can be used for prediction. For standard classification methods, the high dimensionality of gene expression data poses a severe challenge. Common approaches address this problem by excluding genes during preprocessing. As an alternative, L1-regularized regression (Lasso) can be used to identify the most impactful genes. We argue for an adaptive penalization scheme, based on the biological insight that decisive genes are expressed differently among the cell types. The differences in gene expression are measured as their discriminative power (DP), which is based on the univariate compactness within classes and separation between classes. ANOVA-based measures, as well as measures from clustering theory, are applied to construct the covariate-specific DP. The resulting model, which we call Discriminative Power Lasso (DP-Lasso), incorporates the DP as covariate-specific penalization into the Lasso. Genes with a higher DP are penalized less heavily and have a higher chance of being part of the final model. In this way, the model can be guided towards more promising and trustworthy genes, while the coefficients of uninformative genes can be shrunk to zero more reliably. We test our method on single-cell RNA-sequencing data as well as on simulated data. DP-Lasso leads on average to significantly sparser solutions than competing Lasso-based regularization approaches, while being competitive in terms of accuracy.
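    A minimal sketch of the underlying idea (my own simplification; the DP measure, the penalty transform, and the use of a plain Lasso on a numeric outcome are illustrative assumptions, not the paper's exact choices): compute a univariate ANOVA F statistic per gene as its DP, turn it into a covariate-specific penalty weight, and emulate the weighted L1 penalty by rescaling columns before a standard Lasso fit.

import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.linear_model import Lasso

def dp_lasso(X, y, alpha=0.05):
    F, _ = f_classif(X, y)                     # one possible DP measure per gene
    penalty = 1.0 / (1.0 + F)                  # higher DP -> weaker penalty weight
    X_scaled = X / penalty                     # column-wise rescaling emulates weighted L1
    model = Lasso(alpha=alpha).fit(X_scaled, y)
    return model.coef_ / penalty               # back-transform to the original scale

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
y = (X[:, 0] - X[:, 1] > 0).astype(float)      # only the first two "genes" matter
print(np.flatnonzero(dp_lasso(X, y)))          # indices of genes kept in the model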

    A computational reproducibility study of PLOS ONE articles featuring longitudinal data analyses

    Seibold H, Czerny S, Decke S, et al. A computational reproducibility study of PLOS ONE articles featuring longitudinal data analyses. PLoS ONE. 2021;16(6):e0251194.
    Computational reproducibility is a cornerstone of sound and credible research. In complex statistical analyses in particular, such as the analysis of longitudinal data, reproducing results is far from simple, especially if no source code is available. In this work we aimed to reproduce the analyses of longitudinal data in 11 articles published in PLOS ONE. Inclusion criteria were the availability of data and author consent. We investigated the types of methods and software used and whether we were able to reproduce the data analysis using open source software. Most articles provided overview tables and simple visualisations. Generalised Estimating Equations (GEEs) were the most popular statistical models among the selected articles. Only one article used open source software, and only one published part of the analysis code. Replication was difficult in most cases and required reverse engineering of results or contacting the authors. For three articles we were not able to reproduce the results, and for another two only parts of them. For all but two articles we had to contact the authors to be able to reproduce the results. Our main lesson is that reproducing papers is difficult if no code is supplied, which places a high burden on those conducting the reproductions. Open data policies in journals are good, but to truly boost reproducibility we suggest adding open code policies.
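    Purely for illustration (not code from the study): the kind of longitudinal analysis that was reproduced, a GEE fit with an exchangeable working correlation, can be run entirely with open source software such as statsmodels. All variable names and numbers below are hypothetical.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subjects, n_visits = 50, 4
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subjects), n_visits),
    "visit": np.tile(np.arange(n_visits), n_subjects),
    "treated": np.repeat(rng.integers(0, 2, n_subjects), n_visits),
})
df["outcome"] = 1.0 + 0.5 * df["visit"] + 0.8 * df["treated"] + rng.normal(size=len(df))

# GEE with an exchangeable working correlation accounts for repeated measures per subject.
model = smf.gee("outcome ~ visit + treated", groups="subject", data=df,
                cov_struct=sm.cov_struct.Exchangeable(), family=sm.families.Gaussian())
print(model.fit().summary())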